Speech/Nonspeech Segmentation in Web Videos
Abstract
Speech transcription of web videos requires first detecting the segments that contain transcribable speech; we refer to this task as segmentation. Commonly used segmentation techniques are inadequate for domains such as YouTube, where videos span a wide variety of background and recording conditions. In this work, we investigate alternative audio features and a discriminative classifier, which together yield a lower frame error rate on YouTube videos (25.3%) than the commonly used Gaussian mixture models trained on cepstral features (30.6%). The alternative audio features perform particularly well in noisy conditions.
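The abstract reports results as a frame error rate but does not define it. A common reading is the fraction of fixed-length audio frames whose speech/nonspeech label disagrees with the reference. The sketch below assumes that definition, binary per-frame labels (1 = speech, 0 = nonspeech), and equal-length label sequences; the function names are hypothetical, and the paper's exact scoring protocol may differ. It also shows one simple way frame labels could be merged into speech segments.

```python
from typing import List, Sequence, Tuple


def frame_error_rate(reference: Sequence[int], hypothesis: Sequence[int]) -> float:
    """Fraction of frames whose predicted label differs from the reference.

    Assumes one binary label per frame: 1 = speech, 0 = nonspeech.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")
    if len(reference) == 0:
        raise ValueError("at least one frame is required")
    errors = sum(1 for r, h in zip(reference, hypothesis) if r != h)
    return errors / len(reference)


def frames_to_segments(labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Merge runs of consecutive speech frames (label 1) into segments.

    Each segment is a half-open range (start_frame, end_frame) of frame indices.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index of the first frame of the current speech run, if any
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i  # a speech run begins
        elif label != 1 and start is not None:
            segments.append((start, i))  # the speech run ended at the previous frame
            start = None
    if start is not None:
        segments.append((start, len(labels)))  # a speech run reaches the last frame
    return segments


if __name__ == "__main__":
    ref = [0, 1, 1, 1, 0, 0, 1, 1]
    hyp = [0, 1, 1, 0, 0, 1, 1, 1]
    print(frame_error_rate(ref, hyp))  # 2 of 8 frames differ
    print(frames_to_segments(hyp))
```

Segments are returned as frame indices so the arithmetic stays exact; multiplying by an assumed frame shift (for example 10 ms) would convert them to times in seconds.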
Publication date: 2012